Retroactive search of objects using k-d tree

ABSTRACT

In one embodiment, a method includes receiving a set of one or more content objects to be blacklisted; retrieving a set of currently blacklisted content objects; and determining a delta set of content objects that includes the content objects in the set of content objects to be blacklisted that are not included in the set of currently blacklisted content objects. Each of the content objects of the delta set is represented as a vector that includes a number of first elements. The method also includes retrieving, for each content object of a third set of content objects, a representation of the content object as a vector that includes a number of second elements; and identifying each content object in the third set whose content substantially matches at least one content object of the delta set.

PRIORITY

This application is a continuation under 35 U.S.C. §120 of U.S. patent application Ser. No. 13/599,162, filed 30 Aug. 2012.

TECHNICAL FIELD

This disclosure generally relates to retroactively searching for objects having specific contents.

BACKGROUND

In computer science, a k-dimensional tree, or k-d tree for short, is a space-partitioning data structure for organizing data points in a k-dimensional space. k-d trees are a useful data structure for many applications, such as searches involving a multidimensional search key.

SUMMARY OF PARTICULAR EMBODIMENTS

In particular embodiments, a social-networking system may receive data objects from its users and store these data objects in the system. For example, a data object may be an image, which a user uploads to the social-networking system. Sometimes, the social-networking system may identify data objects having specific types of contents. For example, for policy reasons, the social-networking system may identify images having pornographic, hateful, racist, dangerous, violent, or offensive contents so that such undesirable or unsuitable images are not freely shared among its users.

In particular embodiments, the social-networking system may maintain a list of data objects having specific types of contents. For example, the social-networking system may maintain a blacklist of images having undesirable or unsuitable contents. This blacklist of images may be updated as needed. New images with undesirable or unsuitable contents may be added to the blacklist as they become known to the social-networking system. Some images on the blacklist may be deleted (e.g., very old images).

In particular embodiments, the social-networking system may periodically conduct a retroactive search among those data objects already in the system to identify all the objects having specific types of contents. For example, since new images may be added to the blacklist or existing images may be deleted from the blacklist, the social-networking system may periodically search through all the images in the system to identify those images having undesirable contents based on the current blacklist of images. To do so, in particular embodiments, a delta blacklist of images may be constructed, which may include only the difference between the two versions of the blacklist (e.g., new images added to the blacklist between the current time and the previous time when the last such search was performed). The images on the delta blacklist are then compared to all the images currently in the system. Images in the system whose contents substantially match the content of at least one image on the delta blacklist may be identified.

In particular embodiments, to improve performance of the search process, the images in the social-networking system are stored in one or more k-dimensional trees (k-d trees for short). More specifically, each k-d tree is balanced. Each image on the delta blacklist is compared against each k-d tree to identify those images in the system whose content substantially matches the content of the image on the delta blacklist.
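As an illustration only, the following Python sketch shows one way a balanced k-d tree over image hashes might be built and queried. The names `KDNode`, `build_kd_tree`, and `collect_within` are hypothetical and do not come from this disclosure, and toy 2-element vectors stand in for real image hashes. The sketch assumes that a match means a sum of squared element differences below a threshold.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, ...]


class KDNode:
    """One node of a k-d tree (hypothetical helper type)."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]):
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kd_tree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced tree by splitting at the median of the current axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kd_tree(ordered[:mid], depth + 1),
                  build_kd_tree(ordered[mid + 1:], depth + 1))


def collect_within(node: Optional[KDNode], query: Point,
                   threshold: int, out: List[Point]) -> None:
    """Append every stored point whose squared distance to query is < threshold."""
    if node is None:
        return
    dist2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
    if dist2 < threshold:
        out.append(node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    collect_within(near, query, threshold, out)
    # The far side can hold a match only if the splitting plane is close enough.
    if diff * diff < threshold:
        collect_within(far, query, threshold, out)


stored = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(stored)
matches: List[Point] = []
collect_within(tree, (6, 3), 10, matches)
print(sorted(matches))  # [(5, 4), (7, 2), (8, 1)]
```

Splitting at the median keeps the two subtrees within one point of each other in size, which is what makes the tree balanced. The pruning test on the splitting plane is what lets a query skip whole subtrees.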

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example network environment associated with a social-networking system.

FIG. 2 illustrates an example social graph.

FIG. 3 illustrates an example method for conducting a retroactive search among images stored in a system to identify images having specific types of contents.

FIG. 4 illustrates an example k-d tree storing 10 images.

FIG. 5 illustrates an example computer system.

DESCRIPTION OF EXAMPLE EMBODIMENTS

In particular embodiments, a system, such as a social-networking system, may conduct periodic searches to identify data objects stored in the system that have specific types of contents. For example, the social-networking system may receive data objects, such as images, videos, or texts, from its users and store these data objects in the system. However, for various policy reasons, some of these data objects may have contents that are considered undesirable or unsuitable to the social-networking system. For example, some images may have pornographic (especially child pornography), hateful, racist, dangerous, violent, or offensive contents. The social-networking system may periodically search through all the images stored therein to identify images with such undesirable or unsuitable contents so that appropriate actions may be taken with respect to these images.

FIG. 1 illustrates an example network environment 100 associated with a social-networking system. Network environment 100 includes a user 101, a client system 130, a social-networking system 160, and a third-party system 170 connected to each other by a network 110. Although FIG. 1 illustrates a particular arrangement of user 101, client system 130, social-networking system 160, third-party system 170, and network 110, this disclosure contemplates any suitable arrangement of user 101, client system 130, social-networking system 160, third-party system 170, and network 110. As an example and not by way of limitation, two or more of client system 130, social-networking system 160, and third-party system 170 may be connected to each other directly, bypassing network 110. As another example, two or more of client system 130, social-networking system 160, and third-party system 170 may be physically or logically co-located with each other in whole or in part. Moreover, although FIG. 1 illustrates a particular number of users 101, client systems 130, social-networking systems 160, third-party systems 170, and networks 110, this disclosure contemplates any suitable number of users 101, client systems 130, social-networking systems 160, third-party systems 170, and networks 110. As an example and not by way of limitation, network environment 100 may include multiple users 101, client systems 130, social-networking systems 160, third-party systems 170, and networks 110.

In particular embodiments, user 101 may be an individual (human user), an entity (e.g., an enterprise, business, or third-party application), or a group (e.g., of individuals or entities) that interacts or communicates with or over social-networking system 160. In particular embodiments, social-networking system 160 may be a network-addressable computing system hosting an online social network. Social-networking system 160 may generate, store, receive, and transmit social-networking data, such as, for example, user-profile data, concept-profile data, social-graph information, or other suitable data related to the online social network. Social-networking system 160 may be accessed by the other components of network environment 100 either directly or via network 110. In particular embodiments, social-networking system 160 may include an authorization server that allows users 101 to opt in or opt out of having their actions logged by social-networking system 160 or shared with other systems (e.g., third-party systems 170), such as, for example, by setting appropriate privacy settings. In particular embodiments, third-party system 170 may be a network-addressable computing system that can host various functions. Third-party system 170 may be accessed by the other components of network environment 100 either directly or via network 110. In particular embodiments, one or more users 101 may use one or more client systems 130 to access, send data to, and receive data from social-networking system 160 or third-party system 170. Client system 130 may access social-networking system 160 or third-party system 170 directly, via network 110, or via a third-party system. As an example and not by way of limitation, client system 130 may access third-party system 170 via social-networking system 160. Client system 130 may be any suitable computing device, such as, for example, a personal computer, a laptop computer, a cellular telephone, a smartphone, or a tablet computer.

This disclosure contemplates any suitable network 110. As an example and not by way of limitation, one or more portions of network 110 may include an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, or a combination of two or more of these. Network 110 may include one or more networks 110.

Links 150 may connect client system 130, social-networking system 160, and third-party system 170 to communication network 110 or to each other. This disclosure contemplates any suitable links 150. In particular embodiments, one or more links 150 include one or more wireline (such as, for example, Digital Subscriber Line (DSL) or Data Over Cable Service Interface Specification (DOCSIS)), wireless (such as, for example, Wi-Fi or Worldwide Interoperability for Microwave Access (WiMAX)), or optical (such as, for example, Synchronous Optical Network (SONET) or Synchronous Digital Hierarchy (SDH)) links. In particular embodiments, one or more links 150 each include an ad hoc network, an intranet, an extranet, a VPN, a LAN, a WLAN, a WAN, a WWAN, a MAN, a portion of the Internet, a portion of the PSTN, a cellular technology-based network, a satellite communications technology-based network, another link 150, or a combination of two or more such links 150. Links 150 need not necessarily be the same throughout network environment 100. One or more first links 150 may differ in one or more respects from one or more second links 150.

Social-networking system 160 may store various types of data. In particular embodiments, such data may be stored in a graph having any number of nodes and edges, where each edge connects two nodes. The graph is often referred to as a “social graph” as it contains, among others, social information.

FIG. 2 illustrates example social graph 200. In particular embodiments, social-networking system 160 may store one or more social graphs 200 in one or more data stores. In particular embodiments, social graph 200 may include multiple nodes (which may include multiple user nodes 202 or multiple concept nodes 204) and multiple edges 206 connecting the nodes. Example social graph 200 illustrated in FIG. 2 is shown, for didactic purposes, in a two-dimensional visual map representation. In particular embodiments, a social-networking system 160, client system 130, or third-party system 170 may access social graph 200 and related social-graph information for suitable applications. The nodes and edges of social graph 200 may be stored as data objects, for example, in a data store (such as a social-graph database). Such a data store may include one or more searchable or queryable indexes of nodes or edges of social graph 200.

In particular embodiments, a user node 202 may correspond to a user of social-networking system 160. As an example and not by way of limitation, a user may be an individual (human user), an entity (e.g., an enterprise, business, or third-party application), or a group (e.g., of individuals or entities) that interacts or communicates with or over social-networking system 160. In particular embodiments, when a user registers for an account with social-networking system 160, social-networking system 160 may create a user node 202 corresponding to the user, and store the user node 202 in one or more data stores. Users and user nodes 202 described herein may, where appropriate, refer to registered users and user nodes 202 associated with registered users. In addition or as an alternative, users and user nodes 202 described herein may, where appropriate, refer to users that have not registered with social-networking system 160. In particular embodiments, a user node 202 may be associated with information provided by a user or information gathered by various systems, including social-networking system 160. As an example and not by way of limitation, a user may provide his or her name, profile picture, contact information, birth date, sex, marital status, family status, employment, education background, preferences, interests, or other demographic information. In particular embodiments, a user node 202 may be associated with one or more data objects corresponding to information associated with a user. In particular embodiments, a user node 202 may correspond to one or more webpages or one or more user-profile pages (which may be webpages).

In particular embodiments, a concept node 204 may correspond to a concept. As an example and not by way of limitation, a concept may correspond to a place (such as, for example, a movie theater, restaurant, landmark, or city); a website (such as, for example, a website associated with social-networking system 160 or a third-party website associated with a web-application server); an entity (such as, for example, a person, business, group, sports team, or celebrity); a resource (such as, for example, an audio file, video file, digital photo, text file, structured document, or application) which may be located within social-networking system 160 or on an external server, such as a web-application server; real or intellectual property (such as, for example, a sculpture, painting, movie, game, song, idea, photograph, or written work); a game; an activity; an idea or theory; another suitable concept; or two or more such concepts. A concept node 204 may be associated with information of a concept provided by a user or information gathered by various systems, including social-networking system 160. As an example and not by way of limitation, information of a concept may include a name or a title; one or more images (e.g., an image of the cover page of a book); a location (e.g., an address or a geographical location); a website (which may be associated with a URL); contact information (e.g., a phone number or an email address); other suitable concept information; or any suitable combination of such information. In particular embodiments, a concept node 204 may be associated with one or more data objects corresponding to information associated with concept node 204. In particular embodiments, a concept node 204 may correspond to a webpage.

In particular embodiments, a node in social graph 200 may represent or be represented by a webpage (which may be referred to as a “profile page”). Profile pages may be hosted by or accessible to social-networking system 160. Profile pages may also be hosted on third-party websites associated with a third-party system 170. As an example and not by way of limitation, a profile page corresponding to a particular external webpage may be the particular external webpage and the profile page may correspond to a particular concept node 204. Profile pages may be viewable by all or a selected subset of other users. As an example and not by way of limitation, a user node 202 may have a corresponding user-profile page in which the corresponding user may add content, make declarations, or otherwise express himself or herself. As another example and not by way of limitation, a concept node 204 may have a corresponding concept-profile page in which one or more users may add content, make declarations, or express themselves, particularly in relation to the concept corresponding to concept node 204.

In particular embodiments, a concept node 204 may represent a third-party webpage or resource hosted by a third-party system 170. The third-party webpage or resource may include, among other elements, content, a selectable or other icon, or other inter-actable object (which may be implemented, for example, in JavaScript, AJAX, or PHP codes) representing an action or activity. As an example and not by way of limitation, a third-party webpage may include a selectable icon such as “like,” “check in,” “eat,” “recommend,” or another suitable action or activity. A user viewing the third-party webpage may perform an action by selecting one of the icons (e.g., “eat”), causing a client system 130 to transmit to social-networking system 160 a message indicating the user's action. In response to the message, social-networking system 160 may create an edge (e.g., an “eat” edge) between a user node 202 corresponding to the user and a concept node 204 corresponding to the third-party webpage or resource and store edge 206 in one or more data stores.

In particular embodiments, a pair of nodes in social graph 200 may be connected to each other by one or more edges 206. An edge 206 connecting a pair of nodes may represent a relationship between the pair of nodes. In particular embodiments, an edge 206 may include or represent one or more data objects or attributes corresponding to the relationship between a pair of nodes. As an example and not by way of limitation, a first user may indicate that a second user is a “friend” of the first user. In response to this indication, social-networking system 160 may transmit a “friend request” to the second user. If the second user confirms the “friend request,” social-networking system 160 may create an edge 206 connecting the first user's user node 202 to the second user's user node 202 in social graph 200 and store edge 206 as social-graph information in one or more data stores. In the example of FIG. 2, social graph 200 includes an edge 206 indicating a friend relation between user nodes 202 of user “A” and user “B” and an edge indicating a friend relation between user nodes 202 of user “C” and user “B.” Although this disclosure describes or illustrates particular edges 206 with particular attributes connecting particular user nodes 202, this disclosure contemplates any suitable edges 206 with any suitable attributes connecting user nodes 202. As an example and not by way of limitation, an edge 206 may represent a friendship, family relationship, business or employment relationship, fan relationship, follower relationship, visitor relationship, subscriber relationship, superior/subordinate relationship, reciprocal relationship, non-reciprocal relationship, another suitable type of relationship, or two or more such relationships. Moreover, although this disclosure generally describes nodes as being connected, this disclosure also describes users or concepts as being connected. Herein, references to users or concepts being connected may, where appropriate, refer to the nodes corresponding to those users or concepts being connected in social graph 200 by one or more edges 206.

In particular embodiments, an edge 206 between a user node 202 and a concept node 204 may represent a particular action or activity performed by a user associated with user node 202 toward a concept associated with a concept node 204. As an example and not by way of limitation, as illustrated in FIG. 2, a user may “like,” “attended,” “played,” “listened,” “cooked,” “worked at,” or “watched” a concept, each of which may correspond to an edge type or subtype. A concept-profile page corresponding to a concept node 204 may include, for example, a selectable “check in” icon (such as, for example, a clickable “check in” icon) or a selectable “add to favorites” icon. Similarly, after a user clicks these icons, social-networking system 160 may create a “favorite” edge or a “check in” edge in response to a user's action corresponding to a respective action. As another example and not by way of limitation, a user (user “C”) may listen to a particular song (“Imagine”) using a particular application (SPOTIFY, which is an online music application). In this case, social-networking system 160 may create a “listened” edge 206 and a “used” edge (as illustrated in FIG. 2) between user nodes 202 corresponding to the user and concept nodes 204 corresponding to the song and application to indicate that the user listened to the song and used the application. Moreover, social-networking system 160 may create a “played” edge 206 (as illustrated in FIG. 2) between concept nodes 204 corresponding to the song and the application to indicate that the particular song was played by the particular application. In this case, “played” edge 206 corresponds to an action performed by an external application (SPOTIFY) on an external audio file (the song “Imagine”). Although this disclosure describes particular edges 206 with particular attributes connecting user nodes 202 and concept nodes 204, this disclosure contemplates any suitable edges 206 with any suitable attributes connecting user nodes 202 and concept nodes 204. Moreover, although this disclosure describes edges between a user node 202 and a concept node 204 representing a single relationship, this disclosure contemplates edges between a user node 202 and a concept node 204 representing one or more relationships. As an example and not by way of limitation, an edge 206 may represent both that a user likes and has used a particular concept. Alternatively, another edge 206 may represent each type of relationship (or multiples of a single relationship) between a user node 202 and a concept node 204 (as illustrated in FIG. 2 between user node 202 for user “E” and concept node 204 for “SPOTIFY”).

In particular embodiments, social-networking system 160 may create an edge 206 between a user node 202 and a concept node 204 in social graph 200. As an example and not by way of limitation, a user viewing a concept-profile page (such as, for example, by using a web browser or a special-purpose application hosted by the user's client system 130) may indicate that he or she likes the concept represented by the concept node 204 by clicking or selecting a “Like” icon, which may cause the user's client system 130 to transmit to social-networking system 160 a message indicating the user's liking of the concept associated with the concept-profile page. In response to the message, social-networking system 160 may create an edge 206 between user node 202 associated with the user and concept node 204, as illustrated by “like” edge 206 between the user and concept node 204. In particular embodiments, social-networking system 160 may store an edge 206 in one or more data stores. In particular embodiments, an edge 206 may be automatically formed by social-networking system 160 in response to a particular user action. As an example and not by way of limitation, if a first user uploads a picture, watches a movie, or listens to a song, an edge 206 may be formed between user node 202 corresponding to the first user and concept nodes 204 corresponding to those concepts. Although this disclosure describes forming particular edges 206 in particular manners, this disclosure contemplates forming any suitable edges 206 in any suitable manner.

Users 101 of social-networking system 160 may upload data objects, such as images, videos, or texts, to social-networking system 160 to be stored therein. Each data object may be represented by a specific node in social graph 200. Furthermore, an edge may connect the node representing the data object and the node representing the user uploading the data object.

In particular embodiments, when a data object is uploaded to social-networking system 160, social-networking system 160 may verify the content of the data object to ensure that it does not contain undesirable or unsuitable content. For example, when an image is uploaded to social-networking system 160, social-networking system 160 may verify that this image does not have pornographic, hateful, racist, dangerous, violent, or offensive content. In this case, social-networking system 160 may maintain a blacklist of known images having undesirable or unsuitable contents. When an image is uploaded to social-networking system 160, the content of this image is compared to the contents of the images on the blacklist. If the content of this image substantially matches the content of any images on the blacklist, social-networking system 160 may take appropriate actions with respect to this image (e.g., block this image).

Of course, the images on the blacklist are not necessarily always undesirable images. In fact, in particular embodiments, the images on the blacklist may be divided into categories. Alternatively, a separate list may be created for each category of images. One category may include pornographic images. Another category may include violent images. However, a third category may include images of world-famous monuments, while a fourth category may include images of celebrities. In this sense, the images on the blacklist are those having specific types of contents of particular interest to, for example, social-networking system 160. By comparing images with images on the blacklist, images having specific types of contents may be identified. With some implementations, social-networking system 160 may maintain a blacklist of images as well as a whitelist of images. In this case, the images on the blacklist may have undesirable contents, and the images on the whitelist do not necessarily have undesirable contents but may have certain types of contents of particular interest to social-networking system 160 or its users. Either list may be used to identify images in the system having various types of contents.

In particular embodiments, when comparing images with images on the blacklist, only a specific category or categories of images on the blacklist may be used. For example, if the purpose is to locate undesirable images in social-networking system 160, the system may only compare images against those images on the blacklist that belong to categories such as pornographic, racist, and violent images. On the other hand, if the purpose is to locate all the images of sport celebrities, the system may only compare images against those images on the blacklist that belong to the sport celebrity category. This way, images having a specific type of content may be identified by comparing images with the appropriate categories of images on the blacklist.
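As a minimal sketch of this category selection, the following Python fragment restricts a categorized blacklist to the categories relevant to a given search. The `ListedImage` record, its field names, and the category labels are hypothetical and are used here only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ListedImage:
    """A blacklist entry (hypothetical record layout)."""
    image_id: str
    category: str
    image_hash: Tuple[int, ...]


def select_categories(blacklist: Iterable[ListedImage],
                      categories: Iterable[str]) -> List[ListedImage]:
    """Keep only the blacklist entries that belong to the wanted categories."""
    wanted = set(categories)
    return [entry for entry in blacklist if entry.category in wanted]


blacklist = [
    ListedImage("bl-1", "violent", (1, 2, 3, 4)),
    ListedImage("bl-2", "monument", (9, 9, 9, 9)),
    ListedImage("bl-3", "sport celebrity", (5, 5, 5, 5)),
]

# Only the selected entries would then be used in the image comparison.
undesirable = select_categories(blacklist, ["pornographic", "racist", "violent"])
print([entry.image_id for entry in undesirable])  # ['bl-1']
```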

In particular embodiments, an image-matching algorithm may be used to compare two images. Some image-matching algorithms are capable of performing a fuzzy, instead of exact, match between two images. Examples of such image-matching algorithms include, but are not limited to, Discrete Wavelet Transform (DWT) based image hash, hashing via Singular Value Decomposition (SVD), and feature point based image hashing. With particular implementations, an image-matching technology called PhotoDNA developed by Microsoft Inc. may be employed to compare the contents of two specific images. Briefly, given an image (e.g., a JPEG file of a digital photograph), PhotoDNA generates a 144-element vector (i.e., a vector having 144 elements) representing the content of the image. Each element in the vector is 1 byte. This set of 144 elements (i.e., the 144-element vector) is also referred to as the “hash” of the image. Since these elements represent the content of an image, they are essentially the “fingerprint” of the image. To compare the contents of two specific images, image X and image Y, a set of 144 elements, x₁ . . . x₁₄₄, is generated for image X, and a set of 144 elements, y₁ . . . y₁₄₄, is generated for image Y. The proximity measurement between image X and image Y may then be computed as

$\sum_{i=1}^{144} (x_i - y_i)^2.$

The contents of image X and image Y are considered substantially the same (i.e., matching) if the proximity measurement between image X and image Y is less than a predefined threshold.
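The proximity measurement and the threshold test can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the hashes are assumed to already be available as sequences of integers, and toy 4-element vectors stand in for 144-element hashes. It does not reproduce how PhotoDNA itself derives a hash from an image.

```python
from typing import Sequence


def proximity(x: Sequence[int], y: Sequence[int]) -> int:
    """Sum of squared element-wise differences between two image hashes."""
    if len(x) != len(y):
        raise ValueError("hashes must have the same number of elements")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def substantially_match(x: Sequence[int], y: Sequence[int], threshold: int) -> bool:
    """Two images match when their proximity measurement is below the threshold."""
    return proximity(x, y) < threshold


hash_x = [10, 200, 35, 7]
hash_y = [12, 198, 35, 9]

print(proximity(hash_x, hash_y))                # 12
print(substantially_match(hash_x, hash_y, 50))  # True
print(substantially_match(hash_x, hash_y, 12))  # False (not strictly less than 12)
```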

The advantage of some image-matching algorithms (e.g., PhotoDNA) is that the technology performs a fuzzy match, instead of an exact match, of the contents of two images. In other words, it is not necessary for the two images to have exactly the same content in order for such an image-matching algorithm to find a match. Instead, even when there are slight variations between the two images (e.g., one image is slightly cropped from the other image, one image is slightly larger than the other image, or one image has an extra element not found in the other image), the image-matching algorithm can still find a match if the variations are sufficiently minor. How much variation can be tolerated is controlled by the threshold value. The larger the threshold value, the more variation is tolerated, and vice versa.

Using a suitable image-matching algorithm (e.g., PhotoDNA), a set of elements (i.e., the hash) is generated for each image on the blacklist maintained by social-networking system 160. Then, when an image is uploaded to social-networking system 160, a set of elements (i.e., the hash) is also generated for this image. This image is then compared with each image on the blacklist by computing the proximity measurement between this image and each image on the blacklist using their respective sets of elements. If the proximity measurement between this image and an image on the blacklist is sufficiently small (i.e., less than a predefined threshold value), then the content of this image is considered to substantially match the content of that image on the blacklist.
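A minimal Python sketch of this upload-time check is given below. It assumes that hashes have already been computed and are stored as tuples of integers. The name `screen_upload` and the toy 4-element hashes are hypothetical.

```python
from typing import Optional, Sequence


def proximity(x: Sequence[int], y: Sequence[int]) -> int:
    """Sum of squared element-wise differences between two image hashes."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def screen_upload(upload_hash: Sequence[int],
                  blacklist_hashes: Sequence[Sequence[int]],
                  threshold: int) -> Optional[int]:
    """Return the index of the first matching blacklist entry, or None if there is none."""
    for index, listed_hash in enumerate(blacklist_hashes):
        if proximity(upload_hash, listed_hash) < threshold:
            return index
    return None


blacklist_hashes = [(0, 0, 0, 0), (12, 198, 35, 9)]

print(screen_upload((10, 200, 35, 7), blacklist_hashes, 50))      # 1
print(screen_upload((255, 255, 255, 255), blacklist_hashes, 50))  # None
```

The cost of this linear scan grows with the number of images on the blacklist, since every upload is compared with every blacklisted image until a match is found.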

In particular embodiments, social-networking system 160 may update the blacklist of images from time to time. For example, as additional images with undesirable or unsuitable contents become known to social-networking system 160 (e.g., through user reporting), these images may be added to the blacklist. Thus, the blacklist of images may expand as time passes.

For example, suppose that at time t₁, the blacklist contains 1000 images. An image, image X, uploaded to social-networking system 160 at this time is compared to these 1000 images to verify whether it contains undesirable content (e.g., whether the content of image X substantially matches the content of any of the 1000 images currently on the blacklist). Further suppose that no match is found. Image X is then stored in social-networking system 160. At time t₂ (some time after time t₁), new undesirable images have become known to social-networking system 160 and have been added to the blacklist. Suppose that at time t₂, the blacklist now contains 1100 images (i.e., 100 images have been added to the blacklist between time t₁ and time t₂). Another image, image Y, uploaded to social-networking system 160 at this time is thus compared to these 1100 images to verify whether it contains undesirable content (e.g., whether the content of image Y substantially matches the content of any of the 1100 images currently on the blacklist). However, those images uploaded to social-networking system 160 before time t₂ and already stored in the system, including image X, have never been compared to the 100 images newly added to the blacklist between time t₁ and time t₂ since these 100 images were not on the blacklist when those images already stored in the system were uploaded.

In particular embodiments, social-networking system 160 may conduct periodic retroactive searches among all the images stored in the system to ensure that the images stored in the system are compared against the current blacklist of images. For example, the search may be conducted once every 8 hours, once a day, or once a week. How often the search needs to be conducted may depend on how frequently images are uploaded to the system or added to the blacklist.

FIG. 3 illustrates an example method 300 for conducting a retroactive search. Suppose that the search is conducted periodically at times t₁, t₂, t₃, and so on. In this case, for example, at time t₁, the images in the system are compared against the blacklist of images currently available at time t₁ (e.g., 1000 images). At time t₂, suppose that the blacklist now contains 1100 images, with 100 images added to the blacklist between time t₁ and time t₂. However, among these 1100 images on the blacklist, the images in the system have already been compared to the 1000 images on the blacklist during the previous search at time t₁. Only those 100 images newly added to the blacklist between time t₁ and time t₂ have not been verified during the previous search at time t₁. Thus, during the search performed at time t₂, it is not necessary to compare the images in the system with all 1100 images currently on the blacklist. Instead, the images in the system only need to be compared with the 100 images recently added to the blacklist between time t₁ and time t₂. Similarly, at time t₃, suppose that the blacklist now contains 1250 images, with 150 images added to the blacklist between time t₂ and time t₃. Again, the images in the system have already been compared to the 1100 images during the previous searches at time t₁ and time t₂. Only the 150 images newly added to the blacklist between time t₂ and time t₃ have not been verified. Thus, during the search performed at time t₃, the images in the system only need to be compared with the 150 images recently added to the blacklist between time t₂ and time t₃. In other words, the search only needs to be performed incrementally each time.

In particular embodiments, the steps illustrated in FIG. 3 may be repeated each time a retroactive search is conducted. During each search, at STEP 310, a delta blacklist may be constructed that represents the difference between the version of the blacklist at the time of the current search and the version at the time when the immediately previous search was conducted (e.g., the delta blacklist may contain the images added to the blacklist between the time of the current search and the time when the immediately previous search was conducted). For example, suppose that the current search is conducted at time t₂ and the immediately previous search was conducted at time t₁. At time t₁, the blacklist contains 1000 images. Between time t₁ and time t₂, 100 images have been added to the blacklist so that at time t₂, the blacklist contains 1100 images. In this case, the delta blacklist would contain the 100 images added to the blacklist between time t₁ and time t₂.

At STEP 320, the delta blacklist is compared with all the images currently stored in, for example, social-networking system 160. As described above, with specific implementations, image comparison may be performed using PhotoDNA. A hash is generated for each image on the blacklist as well as for each image stored in social-networking system 160. Image hashing (i.e., generating a set of elements representing the content of an image) may be performed at the time an image is uploaded to social-networking system 160 or added to the blacklist. Instead of or in addition to storing the images themselves, the hashes of the images may be stored. Subsequently, the hashes of the images may be retrieved from memory storage for image comparison whenever needed (e.g., computing the proximity measurements between an image on the delta blacklist and an image in the system).

At STEP 330, each image stored in social-networking system 160 whose content substantially matches the content of at least one image on the delta blacklist is identified. Again, with specific implementations, to compare an image stored in social-networking system 160 with an image on the delta blacklist, the proximity measurement between these two images may be computed using their respective hashes. If the proximity measurement is less than a predefined threshold, then the content of the image stored in social-networking system 160 is considered to substantially match that of the image on the delta blacklist. Social-networking system 160 may then take appropriate actions with respect to such an image (e.g., removing the image from social-networking system 160).

Particular embodiments may repeat one or more steps of the method of FIG. 3, where appropriate. Although this disclosure describes and illustrates particular steps of the method of FIG. 3 as occurring in a particular order, this disclosure contemplates any suitable steps of the method of FIG. 3 occurring in any suitable order. Moreover, although this disclosure describes and illustrates particular components, devices, or systems carrying out particular steps of the method of FIG. 3, this disclosure contemplates any suitable combination of any suitable components, devices, or systems carrying out any suitable steps of the method of FIG. 3.
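As an illustration only, the three steps of FIG. 3 might be sketched as follows. This is a minimal sketch under stated assumptions: hashes are plain tuples of numbers, the proximity measurement is a sum of squared element differences, every stored image is scanned directly (no k-d tree), and the function names, identifiers, and threshold value are hypothetical rather than taken from this disclosure.

```python
def proximity(u, v):
    """Sum of squared element-wise differences between two hashes (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def retroactive_search(stored, blacklist_now, blacklist_prev, threshold):
    """One pass of STEPS 310-330; returns the ids of stored images to act on.

    stored, blacklist_now and blacklist_prev each map an id to a hash tuple.
    """
    # STEP 310: the delta blacklist holds entries added since the previous search.
    delta = {k: h for k, h in blacklist_now.items() if k not in blacklist_prev}
    # STEPS 320-330: compare every stored image against the delta blacklist only.
    return sorted(
        image_id
        for image_id, h in stored.items()
        if any(proximity(h, bad) < threshold for bad in delta.values())
    )


# Hypothetical data: three stored images, one blacklist entry at t1, two at t2.
stored = {"img-1": (0, 0, 0), "img-2": (4, 4, 4), "img-3": (9, 9, 9)}
blacklist_t1 = {"bad-1": (0, 0, 1)}
blacklist_t2 = {"bad-1": (0, 0, 1), "bad-2": (4, 5, 4)}

flagged = retroactive_search(stored, blacklist_t2, blacklist_t1, threshold=2)
print(flagged)  # ['img-2']
```

In this sketch, img-1 is close to bad-1 but is not reported at time t₂, because bad-1 was already on the blacklist at time t₁ and is therefore not part of the delta blacklist.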

In practice, there may be millions or even billions of images stored in a system, such as social-networking system 160. To perform a complete search of such a great number of images can take a very long time. To improve performance, in particular embodiments, the images stored in social-networking system 160 may be stored in one or more k-d trees, and more specifically, balanced k-d trees. Note that while nodes can be added to or deleted from an existing k-d tree, frequent additions or deletions may cause a balanced k-d tree to become imbalanced.

An image-matching algorithm, such as PhotoDNA, is capable of generating a set of 144 elements for an image that represents the content of the image. These 144 elements may be considered the “fingerprint” of the image. In addition, PhotoDNA is also capable of generating a reduced set of 16 elements for an image that represents, for example, the main or key content features of the image (i.e., the reduced “fingerprint” of the image). The set of 16 elements generated for each image may be used to partition a set of images into a k-d tree.

To further explain, consider a specific example illustrated in FIG. 4. Suppose that there are 10 images, images A to J, to be partitioned into a k-d tree. Note that a small number of images are used to illustrate the process while simplifying the discussion. In practice, the process may be similarly applied to any number of images.

For image A, PhotoDNA may generate a set of 16 elements, a₁ . . . a₁₆, as well as a set of 144 elements, a₁ . . . a₁₄₄. For image B, PhotoDNA may generate a set of 16 elements, b₁ . . . b₁₆, as well as a set of 144 elements, b₁ . . . b₁₄₄. For image C, PhotoDNA may generate a set of 16 elements, c₁ . . . c₁₆, as well as a set of 144 elements, c₁ . . . c₁₄₄. And so on.

The 10 images may be spatially partitioned into the k-d tree based on their respective sets of 16 elements. At level 1 of the k-d tree, the 10 images are sorted according to their respective first one of the 16 elements (i.e., the first element from each 16-element vector as a₁, b₁, c₁ . . . j₁). Suppose that the 10 images are sorted as C, J, G, A, D, F, I, B, E, H. The image in the middle or the median image, image D, is stored in node 410 at level 1 of the tree. Those images to the left of image D, images C, J, G, and A, are stored in the left sub-tree of node 410, while those images to the right of image D, images F, I, B, E, and H, are stored in the right sub-tree of node 410.

At level 2 of the k-d tree, there are two nodes 421 and 422. For node 421 (i.e., the left child node of node 410), images C, J, G, and A are sorted according to their respective second one of the 16 elements (i.e., the second element from each 16-element vector as c₂, j₂, g₂, a₂). Suppose that these 4 images are now sorted as J, A, C, G. The median image, image A, is stored in node 421 at level 2. The image to the left of image A, image J, is stored in the left sub-tree of node 421, while those images to the right of image A, images C and G, are stored in the right sub-tree of node 421. For node 422 (i.e., the right child node of node 410), again, images F, I, B, E, and H are sorted according to their respective second one of the 16 elements (i.e., f₂, i₂, b₂, e₂, h₂). Suppose that these 5 images are now sorted as F, I, H, B, E. The image in the middle, image H, is stored in node 422 at level 2. The images to the left of image H, images F and I, are stored in the left sub-tree of node 422, while those images to the right of image H, images B and E, are stored in the right sub-tree of node 422.

At level 3 of the k-d tree, the left sub-tree of node 421 only has one image, image J. Thus, there is no need to sort anymore. Image J is stored in node 431, which is the left child node of node 421.

For node 432, which is the right child node of node 421, images G and C are sorted according to their respective third one of the 16 elements (i.e., the third element from each 16-element vector as g₃, c₃). Suppose that these 2 images are now sorted as G, C. The median image, image G, is stored in node 432. There is no image to the left of image G and thus no left sub-tree. The image to the right of image G, image C, is stored in the right sub-tree of node 432.

For node 433, which is the left child node of node 422, images F and I are sorted according to their respective third one of the 16 elements (i.e., f₃, i₃). Suppose that these 2 images are now sorted as F, I. The median image, image F, is stored in node 433. There is no image to the left of image F and thus no left sub-tree. The image to the right of image F, image I, is stored in the right sub-tree of node 433.

For node 434, which is the right child node of node 422, images B and E are sorted according to their respective third one of the 16 elements (i.e., b₃, e₃). Suppose that these 2 images are now sorted as E, B. The median image, image E, is stored in node 434. There is no image to the left of image E and thus no left sub-tree. The image to the right of image E, image B, is stored in the right sub-tree of node 434.

At level 4 of the k-d tree, node 441 is the right child node of node 432. The right sub-tree of node 432 only has one image, image C. There is no need to sort at this point, and image C is stored in node 441. Similarly, image I is stored in node 442, which is the right child node of node 433, and image B is stored in node 443, which is the right child node of node 434.

To generalize the example illustrated in FIG. 4, a set of images may be partitioned into a k-d tree level by level. For each image in the set, a set of k elements may be generated. Note that although the example illustrated in FIG. 4 uses 16 as a specific value for k, k may be set to any suitable value. At each level i, given a specific node at level i, the sub-set of images belonging to this portion of the tree is sorted according to their respective i^(th) one of the k elements. The median image is then stored in the node. The images to the left of the median image, if any, are stored in the left sub-tree of the node, and the images to the right of the median image, if any, are stored in the right sub-tree of the node.

If the tree has more than k levels, the sorting of the images repeats the cycle of k elements. Thus, at level k+1, the first one of the k elements is used again to sort the images; at level k+2, the second one of the k elements is used again to sort the images; and so on, until all the images in the set are partitioned into the tree. In other words, with levels numbered from 1, at each level i, the (((i−1) mod k)+1)th element is used to sort the images, when appropriate.

This process ensures that the resulting tree is balanced. In particular embodiments, each image may have a unique identifier. At each node, the identifier, the set of 16 elements, and the set of 144 elements of the corresponding image are stored. With specific implementations, given a specific image, its identifier, the set of 16 elements, and the set of 144 elements may be stored in a block of memory. An index (e.g., a memory reference pointer) may indicate the beginning address of that block of memory. This index may be stored in the corresponding node of the k-d tree.
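The level-by-level partitioning described above can be sketched as a short recursive routine. This is an illustrative sketch, not the disclosed implementation: the `Node` class and `build_kd_tree` function are hypothetical names, only the reduced vector is stored at each node, and the example vectors are invented 3-element stand-ins (k = 3) chosen so that the sort orders match those supposed for FIG. 4. The lower median is used so that 10 images split 4/1/5 as in the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    image_id: str
    vector: Sequence[float]  # the k-element reduced hash used for partitioning
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kd_tree(images, depth=0):
    """Partition (image_id, vector) pairs into a k-d tree.

    depth is 0-based, so level i in the text corresponds to depth i - 1.
    """
    if not images:
        return None
    k = len(images[0][1])
    axis = depth % k  # cycle through the k elements level by level
    ordered = sorted(images, key=lambda item: item[1][axis])
    median = (len(ordered) - 1) // 2  # lower median: 10 images -> the 5th one
    image_id, vector = ordered[median]
    return Node(
        image_id,
        vector,
        left=build_kd_tree(ordered[:median], depth + 1),
        right=build_kd_tree(ordered[median + 1:], depth + 1),
    )


# Invented vectors that reproduce the orderings supposed in the FIG. 4 example.
images = [
    ("A", (4, 2, 0)), ("B", (8, 4, 2)), ("C", (1, 3, 2)), ("D", (5, 0, 0)),
    ("E", (9, 5, 1)), ("F", (6, 1, 1)), ("G", (3, 4, 1)), ("H", (10, 3, 0)),
    ("I", (7, 2, 2)), ("J", (2, 1, 0)),
]
root = build_kd_tree(images)
print(root.image_id, root.left.image_id, root.right.image_id)  # D A H
```

With these stand-in vectors the resulting tree has image D at the root, images A and H at level 2, images J, G, F, and E at level 3, and images C, I, and B at level 4, which is the layout described for FIG. 4.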

In practice, social-networking system 160 may have billions of images uploaded by its users. Instead of storing all of these images in one k-d tree, the images may be divided and stored in multiple k-d trees. With some implementations, each k-d tree may be used to store 2^(n)−1 images, where n may be some positive integer. The advantage of such an implementation is that when comparing the delta blacklist of images with the images in the system during a retroactive search, the comparison may be performed with respect to multiple k-d trees in parallel.

Furthermore, storing 2^(n)−1 images in each k-d tree ensures that the resulting k-d tree is balanced. For example, consider the image partitioning process described above. In the first partitioning step, the set of 2^(n)−1 vectors (each vector corresponding to one image and including k elements representing that image) gets partitioned into two sets of 2^(n-1)−1 vectors and a median vector. This assures that both sub-trees of the new median node, created out of the median vector, have the same number of nodes. The second partitioning step takes each set of 2^(n-1)−1 vectors and partitions it into two sets of 2^(n-2)−1 vectors and a median vector. This process continues until eventually there are 3 vectors that get partitioned into 2 vectors, which become two leaf nodes, and a median vector, which becomes a median node. Thus, each partitioning step guarantees that the resulting sub-trees are of equal sizes. Consequently, the choice of 2^(n)−1 as the number of nodes (i.e., corresponding to images) to be stored in each k-d tree results in a perfectly balanced k-d tree. For example, when n=21, 2²¹−1=2097152−1=2097151 nodes, or approximately 2 million nodes, are stored in each k-d tree.
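The balance argument can be restated as a short calculation, offered here as a worked restatement of the reasoning above. Removing the median vector from a set of 2^(n)−1 vectors leaves an even number of vectors, which splits into two halves of equal size:

$\frac{( {2^{n} - 1} ) - 1}{2} = \frac{2^{n} - 2}{2} = 2^{n - 1} - 1.$

Applying the same step to each half yields sets of 2^(n-2)−1 vectors, and so on down to 2^(1)−1=1, so the tree has n levels with 2^(j-1) nodes at level j, and the level sizes add up to the original count:

$\sum\limits_{j = 1}^{n} 2^{j - 1} = 2^{n} - 1.$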

For example, suppose that there are n computing devices available for performing the comparison. Once the delta blacklist of images is constructed, each of the n computing devices may obtain a copy of the delta blacklist. Then each of the n computing devices may obtain a copy of a different k-d tree (e.g., by performing a memory copy) and then compare the delta blacklist against that copy of the k-d tree. As soon as a computing device finishes processing its copy of the k-d tree, it can obtain a copy of another k-d tree that has not been processed. This may continue until all the k-d trees in the system have been processed (i.e., compared to the delta blacklist).
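The work distribution described above, in which each idle device picks up the next unprocessed k-d tree, resembles a worker pool. The sketch below is an analogy under stated assumptions rather than the disclosed system: threads in one process stand in for separate computing devices, a flat list of (id, hash) pairs stands in for each k-d tree, and all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def proximity(u, v):
    """Sum of squared element-wise differences between two hashes (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def search_partition(partition, delta_blacklist, threshold):
    """Return ids in one partition that match any delta-blacklist hash.

    A flat scan stands in for the k-d tree search to keep the sketch short.
    """
    return [
        image_id
        for image_id, vec in partition
        if any(proximity(vec, bad) < threshold for bad in delta_blacklist)
    ]


def search_all(partitions, delta_blacklist, threshold, workers=4):
    """Process every partition; each idle worker takes the next pending one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda p: search_partition(p, delta_blacklist, threshold), partitions
        )
        return sorted(i for hits in results for i in hits)


# Hypothetical data: three partitions and a two-entry delta blacklist.
partitions = [
    [("img-1", (0, 0)), ("img-2", (9, 9))],
    [("img-3", (5, 5)), ("img-4", (1, 0))],
    [("img-5", (20, 20))],
]
delta = [(0, 0), (5, 6)]
matches = search_all(partitions, delta, threshold=2)
print(matches)  # ['img-1', 'img-3', 'img-4']
```

Threads are used only to keep the example self-contained. For CPU-bound comparisons across machines, a process pool or a distributed job queue would be closer to the arrangement the text describes.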

Conducting a search through a k-d tree is generally faster than a straightforward comparison. For example, suppose that an image, image X (e.g., image X may be on the delta blacklist), is to be compared with the 10 images in the above example (e.g., images A-J may be images stored in a system), which have already been stored in a k-d tree as illustrated in FIG. 4. Without using a k-d tree, image X needs to be compared with each and every one of the 10 images. This requires 10 comparisons (e.g., computing 10 proximity measurements respectively between image X and each of the 10 images). However, using a k-d tree, image X only needs to be compared with some of the 10 images, but not necessarily all of the 10 images. The comparison algorithm traverses the k-d tree down recursively level by level, starting from the root node. At each level, the sub-tree that does not need to be searched is discarded.

With specific implementations, each image on the blacklist may also have its own set of 16 elements and set of 144 elements generated using PhotoDNA. Thus, for image X, PhotoDNA may generate a set of 16 elements, x₁ . . . x₁₆, and a set of 144 elements, x₁ . . . x₁₄₄.

To further explain the recursive comparison algorithm, consider the specific case of comparing image X with image D. In the example illustrated in FIG. 4, image D is stored in node 410, which is the root node of the k-d tree. First, the proximity measurement between images X and D using their respective sets of 16 elements is computed, e.g., as proximity measurement 1:

$\text{proximity measurement 1} = \sum\limits_{i = 1}^{16}( {x_{i} - d_{i}} )^{2}.$

If a potential match is found (e.g., the first proximity measurement computed using the 16 elements is less than a threshold value), then images X and D are compared again using their respective sets of 144 elements to further confirm the match, e.g., by computing proximity measurement 2:

$\text{proximity measurement 2} = \sum\limits_{i = 1}^{144}( {x_{i} - d_{i}} )^{2}.$

This way, whether two images substantially match in content may be determined quickly because in most cases, only the first proximity measurement may need to be computed, and computing the first proximity measurement is faster than computing the second proximity measurement (i.e., 16 elements vs. 144 elements). If the second proximity measurement computed using the 144 elements is also less than the threshold value, this means that a match is found (i.e., the content of image D substantially matches the content of image X). The search of this k-d tree can end at this point (i.e., there is no need to compare image X with the other images also stored in this particular k-d tree), and image D may be identified.
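The two-stage comparison can be sketched as follows. This is a minimal illustration that assumes hashes are lists of numbers and that the same threshold applies to both stages, as the text above suggests; `two_stage_match` is a hypothetical name and the hash values are invented.

```python
def proximity(u, v):
    """Sum of squared element-wise differences between two equal-length hashes."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def two_stage_match(x16, x144, d16, d144, threshold):
    """Run the cheap 16-element check first; confirm with 144 elements only if it passes."""
    if proximity(x16, d16) >= threshold:
        return False  # rejected without touching the 144-element hashes
    return proximity(x144, d144) < threshold


# Invented hashes: X is all zeros; D differs from X in a few elements.
x16, x144 = [0] * 16, [0] * 144
d16, d144 = [1] + [0] * 15, [1] * 4 + [0] * 140

print(two_stage_match(x16, x144, d16, d144, threshold=5))  # True
print(two_stage_match(x16, x144, d16, d144, threshold=3))  # False
```

In the second call the 16-element measurement (1) passes, but the 144-element measurement (4) does not fall below the threshold of 3, so the candidate is rejected at the confirmation stage.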

On the other hand, if either the first or the second proximity measurement between images X and D is greater than or equal to the threshold, this means that the content of image D does not match that of image X. The search algorithm decides whether to search the left sub-tree or the right sub-tree of node 410. The element used to sort the images at this level during the construction of the k-d tree is used. Since node 410 is at level 1 of the k-d tree, the first element of the 16 elements should be used. (For level 2, the second element from the 16-element vector should be used. For level 3, the third element from the 16-element vector should be used. And so on.) Each image has a corresponding 16-element vector. For image X, the first element from the set of 16 elements (i.e., the 16-element vector) is x₁. For image D, the first element from the set of 16 elements (i.e., the 16-element vector) is d₁. Thus, x₁ and d₁ are compared.

If the square of the difference between x₁ and d₁, (x₁−d₁)², is greater than the threshold value, then one of the sub-trees of node 410 (i.e., image D) may be eliminated. In this case, if x₁≦d₁, then all the nodes in the right sub-tree of node 410, which correspond to images H, F, E, I, and B, will each have a proximity measurement value with image X that is greater than the threshold. For example, image H with the hash h₁ . . . h₁₆ will have a proximity measurement value with image X as

$\sum\limits_{i = 1}^{16}{( {x_{i} - h_{i}} )^{2}.}$

Since h₁≧d₁≧x₁, it means (x₁−h₁)²≧(x₁−d₁)², which is greater than the threshold. Because every term of the sum is non-negative, the proximity measurement between images H and X is at least (x₁−h₁)² and will definitely be higher than the threshold, and so there is no need to compare image X with image H. The same reasoning applies to images F, E, I, and B (i.e., all the images in the right sub-tree of node 410). Therefore, the right sub-tree of node 410 (i.e., image D) may be ignored if x₁≦d₁. Instead, the recursive algorithm proceeds down to the left sub-tree of node 410. In this case, the image to be compared during the next recursive iteration is image A at node 421, which is the left child of node 410.

On the other hand, if x₁>d₁, then all the nodes in the left sub-tree of node 410, which correspond to images A, J, G, and C, will each have a proximity measurement value with image X that is greater than the threshold. For example, image A with the hash a₁ . . . a₁₆ will have a proximity measurement value with image X as

$\sum\limits_{i = 1}^{16}{( {x_{i} - a_{i}} )^{2}.}$

Since a₁≦d₁<x₁, it means (x₁−a₁)²≧(x₁−d₁)², which is greater than the threshold. Because every term of the sum is non-negative, the proximity measurement between images A and X is at least (x₁−a₁)² and will definitely be higher than the threshold, and so there is no need to compare image X with image A. Again, the same reasoning applies to images J, G, and C (i.e., all the images in the left sub-tree of node 410). Therefore, the left sub-tree of node 410 (i.e., image D) may be ignored if x₁>d₁. Instead, the recursive algorithm proceeds down to the right sub-tree of node 410. In this case, the image to be compared during the next recursive iteration is image H at node 422, which is the right child of node 410.

If the square of the difference between x₁ and d₁, (x₁−d₁)², is less than or equal to the threshold value, then both the left and the right sub-trees of node 410 (i.e., image D) need to be searched. In this case, the recursive algorithm proceeds down to node 421 and compares image X with image A, as well as proceeds down to node 422 and compares image X with image H. With some implementations, the two comparisons may be performed in parallel (e.g., applying appropriate parallel processing or multi-threading techniques).

If the current node (i.e., node 410 in this case) has no child node, the recursion may end. Note that it is possible that the content of image X does not match the content of any image in a k-d tree, in which case no image in the k-d tree is identified.

The process described above with images X and D may be similarly applied during each recursive iteration as the search algorithm traverses down the k-d tree. For example, suppose that during the second iteration, image X is to be compared with image A. The process described above may then be applied to images X and A (i.e., with image A in the place of image D). The recursion may end either when a match is found between image X and an image stored at a particular node in the tree or when a leaf node is reached since a leaf node has no child.
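Putting the preceding paragraphs together, the recursive search with sub-tree elimination might look like the sketch below. It is an illustration under stated assumptions: the `Node` class, the `search` function, and the optional `visited` list (used only to show which nodes were examined) are hypothetical, depth is 0-based, the two sub-trees are searched sequentially rather than in parallel, and the small hand-built tree uses invented 2-element reduced hashes and 3-element full hashes in place of the 16 and 144 elements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    image_id: str
    short: Sequence[float]  # reduced hash (16 elements in the text)
    full: Sequence[float]   # full hash (144 elements in the text)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def proximity(u, v):
    """Sum of squared element-wise differences between two hashes."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def search(node, x_short, x_full, threshold, depth=0, visited=None):
    """Return the id of the first stored image that matches X, or None."""
    if node is None:
        return None
    if visited is not None:
        visited.append(node.image_id)
    # Two-stage comparison: the reduced hash first, then the full hash.
    if (proximity(x_short, node.short) < threshold
            and proximity(x_full, node.full) < threshold):
        return node.image_id
    axis = depth % len(node.short)  # element used to sort at this level
    diff = x_short[axis] - node.short[axis]
    if diff * diff > threshold:
        # Only one side can hold a match: left if x <= d, otherwise right.
        branches = [node.left] if diff <= 0 else [node.right]
    else:
        branches = [node.left, node.right]
    for child in branches:
        found = search(child, x_short, x_full, threshold, depth + 1, visited)
        if found is not None:
            return found
    return None


# Hand-built tree with invented hashes (split on element 1, then element 2).
tree = Node("D", (5, 0), (5, 0, 0),
            left=Node("A", (2, 3), (2, 3, 0), left=Node("J", (1, 1), (1, 1, 0))),
            right=Node("H", (8, 1), (8, 1, 0), right=Node("E", (9, 4), (9, 4, 0))))

visited = []
hit = search(tree, (9, 4), (9, 4, 0), threshold=2, visited=visited)
print(hit, visited)  # E ['D', 'H', 'E']
```

In this run the left sub-tree of the root is never visited, because the squared difference on the first element, (9−5)²=16, exceeds the threshold and x₁>d₁.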

FIG. 5 illustrates an example computer system 500. In particular embodiments, one or more computer systems 500 perform one or more steps of one or more methods described or illustrated herein (e.g., performing image comparison). In particular embodiments, one or more computer systems 500 provide functionality described or illustrated herein. In particular embodiments, software running on one or more computer systems 500 performs one or more steps of one or more methods described or illustrated herein or provides functionality described or illustrated herein. Particular embodiments include one or more portions of one or more computer systems 500. Herein, reference to a computer system may encompass a computing device, where appropriate. Moreover, reference to a computer system may encompass one or more computer systems, where appropriate.

This disclosure contemplates any suitable number of computer systems 500. This disclosure contemplates computer system 500 taking any suitable physical form. As an example and not by way of limitation, computer system 500 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a server, a tablet computer system, or a combination of two or more of these. Where appropriate, computer system 500 may include one or more computer systems 500; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 500 may perform without substantial spatial or temporal limitation one or more steps of one or more methods described or illustrated herein. As an example and not by way of limitation, one or more computer systems 500 may perform in real time or in batch mode one or more steps of one or more methods described or illustrated herein. One or more computer systems 500 may perform at different times or at different locations one or more steps of one or more methods described or illustrated herein, where appropriate.

In particular embodiments, computer system 500 includes a processor 502, memory 504, storage 506, an input/output (I/O) interface 508, a communication interface 510, and a bus 512. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.

In particular embodiments, processor 502 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 502 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 504, or storage 506; decode and execute them; and then write one or more results to an internal register, an internal cache, memory 504, or storage 506. In particular embodiments, processor 502 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates processor 502 including any suitable number of any suitable internal caches, where appropriate. As an example and not by way of limitation, processor 502 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 504 or storage 506, and the instruction caches may speed up retrieval of those instructions by processor 502. Data in the data caches may be copies of data in memory 504 or storage 506 for instructions executing at processor 502 to operate on; the results of previous instructions executed at processor 502 for access by subsequent instructions executing at processor 502 or for writing to memory 504 or storage 506; or other suitable data. The data caches may speed up read or write operations by processor 502. The TLBs may speed up virtual-address translation for processor 502. In particular embodiments, processor 502 may include one or more internal registers for data, instructions, or addresses. This disclosure contemplates processor 502 including any suitable number of any suitable internal registers, where appropriate. Where appropriate, processor 502 may include one or more arithmetic logic units (ALUs); be a multi-core processor; or include one or more processors 502. Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.

In particular embodiments, memory 504 includes main memory for storing instructions for processor 502 to execute or data for processor 502 to operate on. As an example and not by way of limitation, computer system 500 may load instructions from storage 506 or another source (such as, for example, another computer system 500) to memory 504. Processor 502 may then load the instructions from memory 504 to an internal register or internal cache. To execute the instructions, processor 502 may retrieve the instructions from the internal register or internal cache and decode them. During or after execution of the instructions, processor 502 may write one or more results (which may be intermediate or final results) to the internal register or internal cache. Processor 502 may then write one or more of those results to memory 504. In particular embodiments, processor 502 executes only instructions in one or more internal registers or internal caches or in memory 504 (as opposed to storage 506 or elsewhere) and operates only on data in one or more internal registers or internal caches or in memory 504 (as opposed to storage 506 or elsewhere). One or more memory buses (which may each include an address bus and a data bus) may couple processor 502 to memory 504. Bus 512 may include one or more memory buses, as described below. In particular embodiments, one or more memory management units (MMUs) reside between processor 502 and memory 504 and facilitate accesses to memory 504 requested by processor 502. In particular embodiments, memory 504 includes random access memory (RAM). This RAM may be volatile memory, where appropriate. Where appropriate, this RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, this RAM may be single-ported or multi-ported RAM. This disclosure contemplates any suitable RAM. Memory 504 may include one or more memories 504, where appropriate. Although this disclosure describes and illustrates particular memory, this disclosure contemplates any suitable memory.

In particular embodiments, storage 506 includes mass storage for data or instructions. As an example and not by way of limitation, storage 506 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Storage 506 may include removable or non-removable (or fixed) media, where appropriate. Storage 506 may be internal or external to computer system 500, where appropriate. In particular embodiments, storage 506 is non-volatile, solid-state memory. In particular embodiments, storage 506 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. This disclosure contemplates mass storage 506 taking any suitable physical form. Storage 506 may include one or more storage control units facilitating communication between processor 502 and storage 506, where appropriate. Where appropriate, storage 506 may include one or more storages 506. Although this disclosure describes and illustrates particular storage, this disclosure contemplates any suitable storage.

In particular embodiments, I/O interface 508 includes hardware, software, or both providing one or more interfaces for communication between computer system 500 and one or more I/O devices. Computer system 500 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and computer system 500. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. This disclosure contemplates any suitable I/O devices and any suitable I/O interfaces 508 for them. Where appropriate, I/O interface 508 may include one or more device or software drivers enabling processor 502 to drive one or more of these I/O devices. I/O interface 508 may include one or more I/O interfaces 508, where appropriate. Although this disclosure describes and illustrates a particular I/O interface, this disclosure contemplates any suitable I/O interface.

In particular embodiments, communication interface 510 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between computer system 500 and one or more other computer systems 500 or one or more networks. As an example and not by way of limitation, communication interface 510 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 510 for it. As an example and not by way of limitation, computer system 500 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, computer system 500 may communicate with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Computer system 500 may include any suitable communication interface 510 for any of these networks, where appropriate. Communication interface 510 may include one or more communication interfaces 510, where appropriate. Although this disclosure describes and illustrates a particular communication interface, this disclosure contemplates any suitable communication interface.

In particular embodiments, bus 512 includes hardware, software, or both coupling components of computer system 500 to each other. As an example and not by way of limitation, bus 512 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HYPERTRANSPORT (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus or a combination of two or more of these. Bus 512 may include one or more buses 512, where appropriate. Although this disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.

Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such as, for example, field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM-drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these, where appropriate. A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate.

Herein, “or” is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A or B” means “A, B, or both,” unless expressly indicated otherwise or indicated otherwise by context. Moreover, “and” is both joint and several, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A and B” means “A and B, jointly or severally,” unless expressly indicated otherwise or indicated otherwise by context.

The scope of this disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments described or illustrated herein that a person having ordinary skill in the art would comprehend. The scope of this disclosure is not limited to the example embodiments described or illustrated herein. Moreover, although this disclosure describes and illustrates respective embodiments herein as including particular components, elements, functions, operations, or steps, any of these embodiments may include any combination or permutation of any of the components, elements, functions, operations, or steps described or illustrated anywhere herein that a person having ordinary skill in the art would comprehend. Furthermore, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative.

What is claimed is:
 1. A method comprising: by a computing device, receiving a set of one or more content objects to be blacklisted; by the computing device, retrieving a set of currently blacklisted content objects; by the computing device, determining a delta set of content objects comprising the content objects in the set of content objects to be blacklisted that are not included in the set of currently blacklisted content objects, wherein each of the content objects of the delta set is represented as a vector comprising a plurality of first elements; by the computing device, retrieving, for each content object of a third set of content objects, a representation of the content object as a vector comprising a plurality of second elements; by the computing device, identifying each content object in the third set whose content substantially matches at least one content object of the delta set based on a determination as to whether calculated differences between the first elements and the corresponding second elements is less than a pre-determined threshold.
 2. The method of claim 1, wherein the set of currently blacklisted content objects comprises one or more categories, and wherein the identification corresponds to a comparison of one or more stored images to one or more of the categories.
 3. The method of claim 1, wherein the content objects are images and the identification is performed using an image-matching algorithm.
 4. The method of claim 3, wherein the image-matching algorithm is a discrete waveform transformation, singular value decomposition, or feature point based image hashing.
 5. The method of claim 1, wherein the set of content objects to be blacklisted comprises an updated blacklist.
 6. The method of claim 1, wherein the plurality of first elements represents content of the currently blacklisted content objects or the content objects to be blacklisted, and wherein the plurality of second elements represents content of content objects stored on a social-networking system.
 7. The method of claim 1, wherein the third set of content objects is stored in a k-dimensional tree.
 8. The method of claim 7, wherein: the k-dimensional tree comprises a root node and a plurality of sub-trees connected to the root node; and the plurality of sub-trees comprises a plurality of nodes.
 9. The method of claim 8, wherein identifying each content object comprises identifying one of the sub-trees for a subsequent comparison based at least in part on a difference between a first element corresponding to a current node of the k-dimensional tree and a second element corresponding to the current node.
 10. The method of claim 9, wherein identifying each content object further comprises eliminating content objects of one or more unidentified sub-trees from the identification based at least in part on the difference between the first element corresponding to the current node of the k-dimensional tree and the second element corresponding to the current node being more than the pre-determined threshold.
 11. The method of claim 10, further comprising discarding a sub-tree of the k-dimensional tree that corresponds to eliminated content objects.
 12. The method of claim 8, wherein identifying each content object comprises: calculating a difference between a first element corresponding to the root node and a second element corresponding to the root node; and calculating a difference between a first element corresponding to a child node of the root node and a second element corresponding to the child node, wherein the child node is identified based on the calculated difference of the root node.
 13. The method of claim 7, wherein each node of the k-dimensional tree stores the vector representing content of one of the content objects of the third set.
 14. The method of claim 7, wherein: each second element corresponds to a level of the k-dimensional tree; and each content object of the third set is sorted within the k-dimensional tree based on a value of each second element.
 15. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive a set of one or more content objects to be blacklisted; retrieve a set of currently blacklisted content objects; determine a delta set of content objects comprising the content objects in the set of content objects to be blacklisted that are not included in the set of currently blacklisted content objects, wherein each of the content objects of the delta set is represented as a vector comprising a plurality of first elements; retrieve, for each content object of a third set of content objects, a representation of the content object as a vector comprising a plurality of second elements; identify each content object in the third set whose content substantially matches at least one content object of the delta set based on a determination as to whether calculated differences between the first elements and the corresponding second elements is less than a pre-determined threshold.
 16. The media of claim 15, wherein the third set of content objects is stored in a k-dimensional tree.
 17. The media of claim 16, wherein: the k-dimensional tree comprises a root node and a plurality of sub-trees connected to the root node; and the plurality of sub-trees comprises a plurality of nodes.
 18. A system comprising: one or more processors; and a memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive a set of one or more content objects to be blacklisted; retrieve a set of currently blacklisted content objects; determine a delta set of content objects comprising the content objects in the set of content objects to be blacklisted that are not included in the set of currently blacklisted content objects, wherein each of the content objects of the delta set is represented as a vector comprising a plurality of first elements; retrieve, for each content object of a third set of content objects, a representation of the content object as a vector comprising a plurality of second elements; identify each content object in the third set whose content substantially matches at least one content object of the delta set based on a determination as to whether calculated differences between the first elements and the corresponding second elements is less than a pre-determined threshold.
 19. The system of claim 18, wherein the third set of content objects is stored in a k-dimensional tree.
 20. The system of claim 19, wherein: the k-dimensional tree comprises a root node and a plurality of sub-trees connected to the root node; and the plurality of sub-trees comprises a plurality of nodes.
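
As an example and not by way of limitation, the delta-set determination and threshold-based k-dimensional tree search recited above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names Node, build_kd_tree, matches, search, and retroactive_search are hypothetical, each content object is assumed to be already represented as an equal-length tuple of numeric elements keyed by a string identifier, the tree is assumed to be built by median split with one element per level, and a "substantial match" is assumed to mean that every element-wise absolute difference is less than the threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Vector = Tuple[float, ...]  # assumed fixed-length representation of an object


@dataclass
class Node:
    """One k-d tree node storing the vector of one stored content object."""
    object_id: str
    vector: Vector
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kd_tree(items: List[Tuple[str, Vector]], depth: int = 0) -> Optional[Node]:
    """Build the tree by median split; level `depth` sorts on element depth % k."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    ordered = sorted(items, key=lambda item: item[1][axis])
    mid = len(ordered) // 2
    object_id, vector = ordered[mid]
    return Node(object_id, vector,
                build_kd_tree(ordered[:mid], depth + 1),
                build_kd_tree(ordered[mid + 1:], depth + 1))


def matches(first: Vector, second: Vector, threshold: float) -> bool:
    """Assumed match rule: every element-wise difference is below the threshold."""
    return all(abs(a - b) < threshold for a, b in zip(first, second))


def search(node: Optional[Node], query: Vector, threshold: float,
           depth: int = 0, found: Optional[Set[str]] = None) -> Set[str]:
    """Collect stored objects matching `query`, skipping sub-trees that cannot match."""
    if found is None:
        found = set()
    if node is None:
        return found
    if matches(query, node.vector, threshold):
        found.add(node.object_id)
    axis = depth % len(query)
    diff = query[axis] - node.vector[axis]
    # Left sub-tree holds values <= the node's value on this axis, right holds >=.
    if diff < threshold:      # otherwise every left value differs by >= threshold
        search(node.left, query, threshold, depth + 1, found)
    if -diff < threshold:     # otherwise every right value differs by >= threshold
        search(node.right, query, threshold, depth + 1, found)
    return found


def retroactive_search(to_blacklist: Dict[str, Vector],
                       currently_blacklisted: Dict[str, Vector],
                       tree: Optional[Node], threshold: float) -> Set[str]:
    """Search the stored objects only for the delta set of newly blacklisted objects."""
    delta = {oid: vec for oid, vec in to_blacklist.items()
             if oid not in currently_blacklisted}
    hits: Set[str] = set()
    for vector in delta.values():
        hits |= search(tree, vector, threshold)
    return hits
```

In this sketch, only objects that are new to the blacklist trigger a search, and a sub-tree is skipped whenever the difference at the current node's element already reaches the threshold, which is what keeps the search from visiting every stored object.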